AD-A012  465 

EMPIRICAL  SAMPLING  STUDY  OF  A GOODNESS 
OF  FIT  STATISTIC  FOR  DENSITY  FUNCTION 
ESTIMATION 

P.  A.  W.  Lewis,  et  al 

Naval  Postgraduate  School 


Prepared for:

National  Science  Foundation 


March  1975 


DISTRIBUTED  BY: 


National  Technical  Information  Service 
U.  S.  DEPARTMENT  OF  COMMERCE 




NPS55LW75031 

NAVAL POSTGRADUATE SCHOOL

Monterey,  California 


Approved  for  public  release;  distribution  unlimited. 


UNCLASSIFIED

REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: NPS55LW75031

4. TITLE (and Subtitle): Empirical Sampling Study of a Goodness
of Fit Statistic for Density Function Estimation

5. TYPE OF REPORT & PERIOD COVERED: Technical Report

7. AUTHOR(s): P. A. W. Lewis, L. H. Liu, D. W. Robinson and
M. Rosenblatt

8. CONTRACT OR GRANT NUMBER(s): AG-476

12. REPORT DATE: March 1975

15. SECURITY CLASS. (of this report): Unclassified

16. DISTRIBUTION STATEMENT (of this Report): Approved for public
release; distribution unlimited.

19. KEY WORDS: Goodness of fit statistic; Histogram estimates;
Density function estimation; Kolmogorov-Smirnov test; Empirical
sampling

20. ABSTRACT

The  distribution  of  a measure  of  the  distance  between  a 
probability  density  function  and  its  estimate  is  examined 
through  empirical  sampling  methods.  The  estimate  of  the 
density  function  is  that  proposed  by  Rosenblatt  using  sums  of 
weight  functions  centered  at  the  observed  values  of  the  random 
variables.  The  weight  function  in  all  cases  was  triangular, 
but  both  uniform  and  Cauchy  densities  were  tried  for  different
sample  sizes  and  bandwidths.  The  simulated  distributions  look
as  if  they  could  be  approximated  by  Gamma  distributions,  in 
many  cases.  Some  assessment  can  also  be  made  of  the  rate  of 
convergence  of  the  moments  and  the  distribution  of  the  measure 
to  the  limiting  moments  and  distribution,  respectively. 




NAVAL  POSTGRADUATE  SCHOOL 
Monterey,  California 


Rear  Admiral  Isham  Linder 
Superintendent 


Jack R. Borsting
Provost 


The  work  reported  herein  was  supported  in  part  by  the  Office 
of  Naval  Research  and  the  National  Science  Foundation,  AG-476. 

Reproduction  of  all  or  part  of  this  report  is  authorized. 

Prepared  by: 

Peter A. W. Lewis, Professor
Department of Operations Research
and Administrative Sciences

L. H. Liu

D. W. Robinson


Reviewed  by: 


M.  Rosenblatt 


Released  by: 


1. INTRODUCTION

There are several recently proposed classes of empirical
probability density functions [1,4,5,7], all generally considered
to be superior to the classical histogram estimates. The class
considered in this paper is based on independent observations,
i.e. $X_1, X_2, \ldots, X_n$ are independent and identically
distributed random variables with continuous unknown density
function $f(x)$. The method used to estimate $f(x)$ is that
proposed by Rosenblatt; denoting the estimate by $f_n(x)$, we
define

$$f_n(x) = \frac{1}{n\,b(n)} \sum_{j=1}^{n} w\!\left(\frac{x - X_j}{b(n)}\right),$$

where $w(u)$ is a bounded non-negative integrable weight function
with

$$\int_{-\infty}^{\infty} w(u)\,du = 1,$$

and $b(n)$ is a positive bandwidth function which tends to zero as
$n \to \infty$, but is such that $1/n = o[b(n)]$. Thus we might
have $b(n) \sim n^{-1/2}$, for example.
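As an illustration only, the estimate defined above can be sketched in a few lines of Python with NumPy. The function names `triangular_weight` and `density_estimate`, the random seed, and the choice of the triangular weight with $b(n) = n^{-1/2}$ are assumptions made for this sketch; they are not taken from the report's own program.

```python
import numpy as np

def triangular_weight(u):
    """Triangular weight w(u) = 1 - |u| for |u| <= 1, and 0 otherwise."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def density_estimate(x, sample, b):
    """Weight-function estimate f_n(x) with bandwidth b, evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (x[:, None] - sample[None, :]) / b      # one row per evaluation point
    return triangular_weight(u).sum(axis=1) / (sample.size * b)

rng = np.random.default_rng(0)                  # arbitrary seed (assumption)
sample = rng.uniform(0.0, 1.0, size=200)
b = 200 ** -0.5                                 # b(n) = n^(-1/2)
grid = np.linspace(-0.5, 1.5, 4001)
f_hat = density_estimate(grid, sample, b)

# Trapezoidal area under the estimate; it should be close to 1.
area = np.sum(0.5 * (f_hat[1:] + f_hat[:-1]) * np.diff(grid))
```

The area check reflects the property that an estimate of this form is itself a density for any fixed set of observations.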


We note that all estimates of this form are themselves density
functions for a given set of observations; that is,

$$f_n(x) \ge 0, \qquad \int_{-\infty}^{\infty} f_n(x)\,dx = 1.$$

Since the $X_j$'s are random variables, $f_n(x)$ is a continuous
parameter stochastic process, but it is clearly non-stationary.

The estimate $f_n(x)$ can be shown to be locally biased for any
value of $x$ under relatively mild conditions [4]. Our object in
this paper is to investigate a global measure of how good $f_n(x)$
is as an estimate of $f(x)$. The measure was originally proposed by
Bickel and Rosenblatt [2] and is given by

$$\beta(n) = \int \frac{[f_n(x) - f(x)]^2}{f(x)}\,dx.$$

Since the value of $\beta(n)$ will vary with each realization of
$X_1, \ldots, X_n$, it is a statistic or function of the $n$ random
variables. A possible application for such a statistic would be in
goodness-of-fit type tests, in an analogous manner to the more
familiar Kolmogorov-Smirnov test.

Bickel and Rosenblatt [2] have established that if
$b(n) = o[n^{-2/9}]$ as $n \to \infty$ and if $a(x)$ is a bounded,
piecewise smooth integrable function then

$$b(n)^{-1/2}\left[\,n\,b(n)\int [f_n(x) - f(x)]^2\,a(x)\,dx - \int f(x)\,a(x)\,dx \int w(z)^2\,dz\right]$$

is asymptotically normally distributed with zero mean and variance

$$2\,w^{(4)}(0)\int a(x)^2 f(x)^2\,dx,$$

as $n \to \infty$, where $w^{(4)}(0)$ is the fourth convolution of
$w$ with itself, evaluated at zero. Thus, $\beta(n)$ has an
asymptotically normal distribution, regardless of the underlying
density $f(x)$.

A problem in this situation is that, unlike the Kolmogorov-Smirnov
test statistic, the statistic $\beta(n)$ is not distribution-free.
Further, its exact distribution for any finite value of $n$ does
not seem to be mathematically tractable. We thus examined some
representative cases through simulation, hoping that $\beta(n)$
would be fairly robust with rapid convergence to the asymptotic
distribution. It was also hoped that the simulations would cast
light on these conjectures and perhaps suggest some unexpected
results.

2. SIMULATION


The primary object of the simulation was to investigate the
distribution of the statistic $\beta(n)$:

$$\beta(n) = \int \frac{[f_n(x) - f(x)]^2}{f(x)}\,dx$$

over a suitable range of integration. We performed simulations
with synthetic sampling from both uniform and Cauchy distributions;
the triangular weight function

$$w(u) = \begin{cases} 1 - |u|, & \text{if } |u| \le 1 \\ 0, & \text{otherwise} \end{cases}$$

was used to evaluate $f_n(x)$ in both cases. We found little
difference as far as $\beta(n)$ was concerned between the
triangular and other "smoother" (e.g., quadratic) weight functions
for our samples of from 100 to 1500 deviates.


A. UNIFORM RANDOM VARIABLES

In the case of uniform (0,1) random variables, we have

$$f(x) = \begin{cases} 1, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Thus, $\beta(n)$ becomes

$$\beta(n) = \int_{b(n)}^{1-b(n)} [f_n(x) - 1]^2\,dx. \qquad (2.1)$$

The limits of integration are from $b(n)$ to $1 - b(n)$ instead of
from 0 to 1 to avoid the marked bias of $f_n(x)$ near 0 and 1. As
long as $b(n) < x < 1 - b(n)$, though, $f_n(x)$ is unbiased:

$$E[f_n(x)] = \frac{1}{b(n)} \int_{x-b(n)}^{x+b(n)} \left[1 - \frac{|x-y|}{b(n)}\right] dy = \frac{1}{b(n)}\left[2b(n) - b(n)\right] = 1.$$

Also, for the same range of $x$,

$$\mathrm{Var}[f_n(x)] = \frac{1}{n\,b(n)^2}\,\mathrm{Var}\!\left[w\!\left(\frac{x - X_1}{b(n)}\right)\right] = \frac{1}{n\,b(n)^2}\left\{ b(n)\int_{-1}^{1} w(u)^2\,du - b(n)^2 \right\} = \frac{1}{n\,b(n)}\left[\frac{2}{3} - b(n)\right].$$
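The pointwise mean and variance of $f_n(x)$ for uniform (0,1) samples can be checked by simulation. The sketch below is not part of the original study. It assumes the triangular weight, for which the variance works out to $[2/3 - b(n)]/[n\,b(n)]$ when $b(n) < x < 1 - b(n)$, and it uses hypothetical names (`pointwise_moments`) and an arbitrary seed and replication count.

```python
import numpy as np

def triangular_weight(u):
    """Triangular weight w(u) = 1 - |u| for |u| <= 1, and 0 otherwise."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def pointwise_moments(x, n, b, reps, rng):
    """Monte Carlo mean and variance of f_n(x) over `reps` uniform(0,1) samples of size n."""
    samples = rng.uniform(0.0, 1.0, size=(reps, n))
    f_hat = triangular_weight((x - samples) / b).sum(axis=1) / (n * b)
    return f_hat.mean(), f_hat.var(ddof=1)

rng = np.random.default_rng(1)          # arbitrary seed (assumption)
n, b, x = 100, 0.1, 0.5                 # b = 1/sqrt(n); x lies inside (b, 1 - b)
mc_mean, mc_var = pointwise_moments(x, n, b, reps=20000, rng=rng)

# Theoretical variance for the triangular weight at an interior point.
theory_var = (2.0 / 3.0 - b) / (n * b)
```

With these settings the simulated mean should sit near 1 and the simulated variance near `theory_var`, up to Monte Carlo error.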


Since $f_n(x)$ is a piecewise linear function when a triangular
weight function is used, the integral in (2.1) can be evaluated in
principle but the work becomes prohibitive for even moderate sample
sizes. We thus approximated the integral using Simpson's rule with
100 equal subintervals. The results were found to be satisfactory
in the sense that the value did not change appreciably when a finer
grid (up to 500 subintervals) was used. In general, we found that a
larger sample size required a finer grid; apparently the value of
$f_n(x)$ changes more rapidly over a small interval when $n$ is
large.

We used three different bandwidths in the uniform case:
$3/n^{1/2}$, $1/n^{1/2}$ and $1/n$. For each bandwidth, sample
sizes of 100, 200, 500, 1000 and 1500 were investigated so that a
total of 15 experiments were carried out. Each experiment consisted
of 2000 independent replications, each of which resulted in the
calculation of a single value of $\beta(n)$ using (2.1). The
replications for a given experiment were divided into five sections
of 400 observations each so that variability of the simulation
results could be assessed between sections.
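One uniform experiment of the kind described above might be sketched as follows. This is a modern reconstruction under stated assumptions, not the authors' program: the helper names (`simpson`, `beta_uniform`), the seed, and the use of NumPy are all choices made for this sketch. It follows the text in using the triangular weight, 100 Simpson subintervals, 2000 replications and five sections of 400.

```python
import numpy as np

def triangular_weight(u):
    """Triangular weight w(u) = 1 - |u| for |u| <= 1, and 0 otherwise."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def simpson(y, h):
    """Composite Simpson's rule for an odd number of equally spaced ordinates."""
    return h / 3.0 * (y[0] + y[-1] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum())

def beta_uniform(sample, b, subintervals=100):
    """Approximate (2.1): the integral of [f_n(x) - 1]^2 over (b, 1 - b)."""
    n = sample.size
    x = np.linspace(b, 1.0 - b, subintervals + 1)
    f_hat = triangular_weight((x[:, None] - sample[None, :]) / b).sum(axis=1) / (n * b)
    h = (1.0 - 2.0 * b) / subintervals
    return simpson((f_hat - 1.0) ** 2, h)

rng = np.random.default_rng(2)              # arbitrary seed (assumption)
n, reps, sections = 100, 2000, 5
b = n ** -0.5                               # bandwidth 1/sqrt(n)
betas = np.array([beta_uniform(rng.uniform(size=n), b) for _ in range(reps)])

# Means of the five sections of 400 replications, to gauge simulation variability.
section_means = betas.reshape(sections, reps // sections).mean(axis=1)
```

Comparing `section_means` across the five sections gives a rough idea of how much the simulated moments vary from one block of 400 replications to the next.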

Besides the 400 observed values of $\beta(n)$, the computer output
for each of the 75 sections included a histogram, an empirical
log-survivor function plot, an empirical CDF plot and a normal
probability plot. A histogram and an empirical log-survivor plot
were also computed for the pooled sample of 2000 for each
experiment. These plots are all reproduced in reference [3]; some
of the more interesting cases are included in Section 4.

It was found that a better picture of the distribution of the data
resulted when the empirical density function of the $\beta(n)$'s
was plotted over the histogram plot. A fairly wide bandwidth was
needed to suppress large fluctuations in $f_n(x)$; it was found
that $b(n) = R/n^{1/2}$ was a fairly robust choice. ($R$ denotes
the sample range [maximum value - minimum value] of the $\beta(n)$
sample.) The solid lines in the Figures in Section 4 are empirical
density estimates using this bandwidth and the triangular weight
function.

B. CAUCHY RANDOM VARIABLES

The Cauchy density function is

$$f(x) = \frac{1}{\pi(1 + x^2)}.$$

We used the same density estimator as in the uniform case:

$$f_n(x) = \frac{1}{n\,b(n)} \sum_{j=1}^{n} w\!\left(\frac{x - X_j}{b(n)}\right),$$

and again the triangular weight function. We chose a range of
integration $(-3, +3)$:

$$\beta(n) = \int_{-3}^{3} \frac{[f_n(x) - f(x)]^2}{f(x)}\,dx.$$

This range comprises 80% of the probability mass for this
distribution. Again, Simpson's rule was used to approximate the
integral; in this case a grid of 600 subintervals was selected
after examining 100, 300, 600 and 900 subinterval grids.
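The Cauchy computation can be sketched in the same way. The code below is an illustrative assumption rather than the original program: `cauchy_density` and `beta_cauchy` are hypothetical names, the seed is arbitrary, and the sample is drawn with NumPy's standard Cauchy generator. It also evaluates the probability mass on $(-3, 3)$ from the Cauchy distribution function, $(2/\pi)\arctan 3$.

```python
import math
import numpy as np

def cauchy_density(x):
    """Standard Cauchy density 1 / (pi * (1 + x^2))."""
    return 1.0 / (np.pi * (1.0 + x ** 2))

def triangular_weight(u):
    """Triangular weight w(u) = 1 - |u| for |u| <= 1, and 0 otherwise."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def beta_cauchy(sample, b, subintervals=600):
    """Simpson approximation of the integral of [f_n - f]^2 / f over (-3, 3)."""
    x = np.linspace(-3.0, 3.0, subintervals + 1)
    f = cauchy_density(x)
    f_hat = triangular_weight((x[:, None] - sample[None, :]) / b).sum(axis=1) / (sample.size * b)
    y = (f_hat - f) ** 2 / f
    h = 6.0 / subintervals
    return h / 3.0 * (y[0] + y[-1] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum())

# Probability that a standard Cauchy variable falls in (-3, 3).
mass = 2.0 / math.pi * math.atan(3.0)

rng = np.random.default_rng(3)              # arbitrary seed (assumption)
n = 100
value = beta_cauchy(rng.standard_cauchy(n), b=n ** -0.5)
```

The computed `mass` is about 0.795, in line with the 80% figure quoted for this range.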


The Cauchy distribution was chosen because for finite $n$,
$f_n(x)$ has a bias component; this component usually decreases
with bandwidth for a fixed value of $n$, although the pointwise
variance of $f_n(x)$ increases with decreasing bandwidth. It seems
likely that the variance of $\beta(n)$ would also decrease under
these conditions, as indeed it was observed to do.

Three bandwidths were also employed in the Cauchy case:
$1/n^{1/2}$, $3/n^{1/2}$ and $20/n^{1/2}$, the last one
representing a case in which bias in the estimator $f_n(x)$ plays a
major role in the distribution of $\beta(n)$. The same five sample
sizes were used here for each bandwidth as were used for the
uniform simulations; output from the fifteen Cauchy experiments was
obtained just as in the uniform case.

3. TABULAR RESULTS AND GAMMA FITS


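Using the asymptotic result obtained by Bickel and Rosenblatt [2],
for a uniform random variable the quantity

$$b(n)^{-1/2}\left\{ n\,b(n)\int_{b(n)}^{1-b(n)} [f_n(x) - 1]^2\,dx - [1 - 2b(n)]\int w(u)^2\,du \right\}$$

is asymptotically normally distributed with mean 0 and variance

$$2\,w^{(4)}(0)\,[1 - 2b(n)]$$

as $n \to \infty$ if $n\,b(n) \to \infty$ and
$b(n) = o(n^{-2/9})$. For the triangular weight function,

$$\int w(u)^2\,du = \frac{2}{3}$$

and the fourth convolution of $w$ with itself at zero,
$w^{(4)}(0)$, is 302/630.

From the above expressions, we get the conjectured values

$$E[\beta(n)] \approx \frac{2}{3}\,\frac{1 - 2b(n)}{n\,b(n)}, \qquad \mathrm{Var}[\beta(n)] \approx \frac{2\,(302/630)\,[1 - 2b(n)]}{n^2\,b(n)}.$$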
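The two constants for the triangular weight, and the conjectured moments that follow from the asymptotic result, can be checked numerically. The sketch below is an illustration under assumptions: the grid spacing `h`, the use of `numpy.convolve` as a discrete stand-in for the continuous convolution, and the helper name `conjectured_moments` are all choices made here, not taken from the report. The moment formulas are the asymptotic mean $(2/3)[1-2b(n)]/[n\,b(n)]$ and variance $2(302/630)[1-2b(n)]/[n^2\,b(n)]$ implied by the normal limit for the uniform case.

```python
import numpy as np

h = 0.001
u = np.arange(-1.0, 1.0 + h / 2, h)          # grid on the support of w
w = 1.0 - np.abs(u)                          # triangular weight on [-1, 1]

int_w2 = np.sum(w ** 2) * h                  # Riemann sum for the integral of w^2
w2 = np.convolve(w, w) * h                   # w convolved with itself, on [-2, 2]
w4 = np.convolve(w2, w2) * h                 # fourth convolution, on [-4, 4]
w4_at_zero = w4[w4.size // 2]                # the centre of the grid is u = 0

def conjectured_moments(n, b):
    """Asymptotic mean and standard deviation of beta(n) in the uniform case."""
    mean = (2.0 / 3.0) * (1.0 - 2.0 * b) / (n * b)
    var = 2.0 * (302.0 / 630.0) * (1.0 - 2.0 * b) / (n ** 2 * b)
    return mean, var ** 0.5

mean_100, sd_100 = conjectured_moments(100, 3 / 100 ** 0.5)
```

For $n = 100$ and $b(n) = 3/\sqrt{n}$ this gives a mean of about .0089 and a standard deviation of about .0113.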


Comparisons of the simulated values for the uniform experiments
with the conjectured ones are tabulated in Table III.1 (means) and
Table III.2 (standard deviations). Especially for small bandwidth
the agreement between the asymptotic and simulated variances is
very good even for small n (n = 100). The same is true for expected
value, although convergence is slower than for the variance and
again slower for large bandwidth.

TABLE III.1. Comparison of estimated mean values and asymptotic
mean values of $\beta(n)$ for different bandwidths and sample
sizes.

                                  $E[\beta(n)]/[1-2b(n)]$
    n     b(n)    $E[\beta(n)]$   Conjectured   Computer output

$b(n) = 3/\sqrt{n}$
  100    .3000       .0089           .0222          .0127
  200    .2121       .0090           .0157          .0109
  500    .1342       .0073           .0099          .0075
 1000    .0949       .0057           .0070          .0058
 1500    .0775       .0048           .0057          .0051

$b(n) = 1/\sqrt{n}$
  100    .1000       .0533           .0667          .0583
  200    .0707       .0405           .0471          .0415
  500    .0447       .0271           .0298          .0269
 1000    .0316       .0197           .0211          .0197
 1500    .0258       .0163           .0172          .0168

TABLE III.2. Comparison of estimated standard deviation values and
asymptotic standard deviation values of $\beta(n)$ for different
bandwidths and sample sizes.

                                       $\sigma[\beta(n)]/[1-2b(n)]$
    n     b(n)    $\sigma[\beta(n)]$   Conjectured   Computer output

$b(n) = 3/\sqrt{n}$
  100    .3000       .0113           .0283          .0115
  200    .2121       .0081           .0141          .0088
  500    .1342       .0046           .0063          .0047
 1000    .0949       .0029           .0036          .0030
 1500    .0775       .0022           .0026          .0023

$b(n) = 1/\sqrt{n}$
  100    .1000       .0277           .0346          .0315
  200    .0707       .0171           .0199          .0189
  500    .0447       .0088           .0097          .0092
 1000    .0316       .0053           .0057          .0030
 1500    .0258       .0040           .0042          .0043

In contrast to the moments, the distribution of $\beta(n)$
converges very slowly. The complete results (reference [3]) reveal
that the histograms and empirical density functions of the
$\beta(n)$'s are all skewed to the right; see Figures 4.1 to 4.9
for examples.

The form of the histograms as well as the log-survivor plots
suggested that the $\beta(n)$ statistic is approximately
Gamma($\theta$, $k$) distributed, where the Gamma density is given
by

$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \qquad x > 0,$$

and the mean and variance are

$$E[X] = k\theta; \qquad \mathrm{Var}[X] = k\theta^2.$$

Accordingly, estimates $\hat{k}$ and $\hat{\theta}$ of $k$ and
$\theta$ for each experiment were obtained from the sample of 2000
$\beta(n)$'s. Shenton and Bowman's almost unbiased estimators for
the Gamma distribution [6] were used; these give reasonable results
when $k > 0.5$, as in this case. The estimated values are tabulated
in Table III.3; also tabulated are estimates of the standard
deviation of $\hat{k}$ and $\hat{\theta}$ which were obtained from
the five sections in each experiment. A parametric density estimate
is thus obtained for the $\beta(n)$ sample; it may be compared with
the non-parametric estimate $f_n(x)$ by examining the graphs in
Section 4, where the Gamma density function is plotted with a
dashed line.
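The fitting step can be illustrated with a simple stand-in. The sketch below does not implement Shenton and Bowman's estimators, whose form is not given in this report; it substitutes plain method-of-moments estimates based on $E[X] = k\theta$ and $\mathrm{Var}[X] = k\theta^2$. The function name `gamma_moment_fit`, the seed, and the use of synthetic Gamma draws in place of simulated $\beta(n)$ values are assumptions made for the illustration.

```python
import numpy as np

def gamma_moment_fit(values):
    """Method-of-moments estimates (k, theta) from E[X] = k*theta, Var[X] = k*theta^2."""
    mean = values.mean()
    var = values.var(ddof=1)
    return mean ** 2 / var, var / mean

rng = np.random.default_rng(4)                  # arbitrary seed (assumption)
true_k, true_theta = 8.881, 0.00311             # illustrative parameter values
# Synthetic Gamma draws standing in for 2000 simulated beta(n) values.
betas = rng.gamma(shape=true_k, scale=true_theta, size=2000)

k_hat, theta_hat = gamma_moment_fit(betas)

# Spread of the estimates across five sections of 400 values each.
section_fits = np.array([gamma_moment_fit(s) for s in betas.reshape(5, 400)])
k_sd, theta_sd = section_fits.std(axis=0, ddof=1)
```

The between-section standard deviations `k_sd` and `theta_sd` play the same role as the section-based standard deviation estimates described in the text.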
TABLE III.3. Estimated parameters for fitted Gamma distribution
for $\beta(n)$.

DISTRIBUTION   b(n)               n      $\hat{k}$          $\hat{\theta}$

UNIFORM        $1/\sqrt{n}$      100     3.969 ± 0.206    0.01390 ± .00095
                                 200     5.780 ± 0.659    0.00715 ± .00095
                                 500     8.881 ± 0.859    0.00311 ± .00029
                                1000    13.011 ± 0.796    0.00153 ± .00008
                                1500    17.316 ± 1.467    0.00095 ± .00008

               $3/\sqrt{n}$      100     1.153 ± 0.048    0.00967 ± .00058
                                 200     1.718 ± 0.174    0.00588 ± .00078
                                 500     2.707 ± 0.241    0.00281 ± .00026
                                1000     4.028 ± 0.241    0.00145 ± .00007
                                1500     5.248 ± 0.423    0.00096 ± .00008

               $1/n$             100    40.337 ± 2.555    0.01616 ± .00117
                                 200    39.511 ± 2.347    0.01675 ± .00106
                                 500    33.649 ± 1.820    0.01958 ± .00111
                                1000    32.033 ± 3.305    0.02059 ± .00244
                                1500    31.712 ± 1.999    0.02088 ± .00124

CAUCHY         $1/\sqrt{n}$      100    22.362 ± 1.488    0.01745 ± .00114
                                 200    32.305 ± 2.022    0.00864 ± .00054
                                 500    60.147 ± 4.009    0.00293 ± .00022
                                1000    79.897 ± 6.608    0.00157 ± .00014
                                1500   101.100 ± 7.783    0.00102 ± .00007

               $3/\sqrt{n}$      100     9.272 ± 0.406    0.01331 ± .00062
                                 200    12.744 ± 0.645    0.00709 ± .00037
                                 500    20.701 ± 1.673    0.00277 ± .00022
                                1000    29.303 ± 2.541    0.00140 ± .00012
                                1500    34.265 ± 3.963    0.00099 ± .00010

               $20/\sqrt{n}$     100     7.103 ± 0.217    0.00776 ± .00035
                                 200     4.144 ± 0.069    0.00619 ± .00009
                                 500     3.445 ± 0.161    0.00312 ± .00016
                                1000     4.211 ± 0.357    0.00152 ± .00009
                                1500     5.385 ± 0.335    0.00095 ± .00005

4. GRAPHICAL RESULTS AND GENERAL DISCUSSION

The graphs for the following experiments have been reproduced from
[3] because they give the greatest insight into the distribution of
$\beta(n)$; these graphical results are more informative than the
tabulated means, variances and Gamma fits of the previous Section.

Figure   Random Variable      n      b(n)             $\hat{k}$

4.1      Uniform             200     $3/\sqrt{n}$      1.718
4.2      Uniform             500     $1/\sqrt{n}$      8.881
4.3      Uniform            1500     $1/\sqrt{n}$     17.316
4.4      Uniform             200     $1/n$            39.511
4.5      Cauchy              100     $1/\sqrt{n}$     22.362
4.6      Cauchy              100     $3/\sqrt{n}$      9.272
4.7      Cauchy             1500     $20/\sqrt{n}$     5.385
4.8      Uniform            1500     $3/\sqrt{n}$      5.248
4.9      Uniform             100     $1/\sqrt{n}$      3.969

In interpreting the graphs we can be guided by crude heuristics.
In the case of a density estimate $f_n(x)$ with bandwidth $b(n)$
there is dependence within a range of order $b(n)$ and an approach
to independence for points separated by a distance of order larger
than $b(n)$. Thus in the case of uniform random variables the
integral $\beta(n)$ could be thought of as having the equivalent of
the order of $[1 - 2b(n)]/b(n)$ independent summands. In the first
case (Figure 4.1; $n = 200$, $b(n) = 3/\sqrt{n}$,
$\hat{k} = 1.718$) we obtain

$$\left(1 - \frac{3\sqrt{2}}{10}\right) \Big/ \left[\frac{3}{10\sqrt{2}}\right] = 2.71.$$

This is rather small so that one does not expect a good Gaussian
fit. We give $\hat{k}$ from the previous Section since $2\hat{k}$
may be interpreted as an equivalent number of degrees of freedom;
the larger the fitted $\hat{k}$, the closer we are to normality. In
a loose sense it is clear that a gamma fit is likely to be more
appropriate and this is confirmed by looking at the graphs.
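The crude index used in this discussion is simple enough to compute directly. The helper below is a hypothetical convenience, not something defined in the report: `crude_index` takes the bandwidth and the length of the underlying range (1 for the uniform case; 6 is assumed here for a Cauchy range of $(-3, 3)$) and returns the range, shortened by $2b(n)$, divided by $b(n)$.

```python
import math

def crude_index(bandwidth, span=1.0):
    """Rough count of independent summands: (span - 2*b) / b."""
    return (span - 2.0 * bandwidth) / bandwidth

uniform_200 = crude_index(3 / math.sqrt(200))            # n = 200, b(n) = 3/sqrt(n)
uniform_1500 = crude_index(1 / math.sqrt(1500))          # n = 1500, b(n) = 1/sqrt(n)
cauchy_100 = crude_index(1 / math.sqrt(100), span=6.0)   # n = 100, b(n) = 1/sqrt(n)
```

The first value reproduces the 2.71 obtained above for the case $n = 200$, $b(n) = 3/\sqrt{n}$.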


In the second case (Figure 4.2; $n = 500$, $b(n) = 1/\sqrt{n}$,
$\hat{k} = 8.881$) we have

$$\left(1 - \frac{\sqrt{2}}{10}\right) 10\sqrt{2} = 12.14,$$

which is a bit larger. It is interesting to note that the estimated
(smoothed) density function of $\beta(n)$ gives us greater insight
apparently in all cases. Here we see the beginning of an approach
to asymptotic normality though it is still suggested that a Gamma
fit might be appropriate.

The next case (Figure 4.3; $n = 1500$, $b(n) = 1/\sqrt{n}$,
$\hat{k} = 17.316$) with

$$\left[1 - \frac{1}{5\sqrt{15}}\right] 10\sqrt{15} = 36.73$$

shows a closer approach to normality. It may be seen that the major
departure between the parametric and non-parametric density
estimates occurs in the vicinity of the mode where $f_n(x)$ tends
to fluctuate about the true value. The fit in the tails appears
excellent in all cases.

The next uniform case (Figure 4.4; $n = 200$, $b(n) = 1/n$,
$\hat{k} = 39.511$) is strictly speaking outside the range of
results suggested by the paper of Bickel and Rosenblatt [2]. Here
$f_n(x)$ is asymptotically compound Poisson rather than
asymptotically normal. Nonetheless we notice that it looks as if a
Gaussian fit would be very good and this is consistent with the
magnitude of our crude index

$$(1 - .01)\,200 = 198.$$

It  would  be  interesting  for  someone  to  prove  the  suggested 
asymptotic  normality. 

In the simulation of sampling from a uniform distribution, the
density estimator has no bias. To investigate the effect of bias,
we repeated the uniform experiments for Cauchy-distributed random
variables, integrated over the range $-3 + b(n)$ to $3 - b(n)$. The
first case (Figure 4.5; $n = 100$, $b(n) = 1/\sqrt{n}$,
$\hat{k} = 22.362$) has index

$$(6 - .2)\,10 = 58$$

and one notices that a Gaussian fit looks very good. The next case
(Figure 4.6; $n = 100$, $b(n) = 3/\sqrt{n}$, $\hat{k} = 9.272$) has
index

$$(6 - .6)\,\frac{10}{3} = 18.00,$$

and a Gaussian fit looks fair but not good. In the last Cauchy case
one expects substantial bias (Figure 4.7; $n = 1500$,
$b(n) = 20/\sqrt{n}$, $\hat{k} = 5.385$) and the crude index is

$$\left(6 - \frac{4}{\sqrt{15}}\right) \frac{\sqrt{15}}{2} = 9.62.$$

A Gamma fit is suggested. Altogether the effects of bias don't seem
to be that extreme when sampling from the Cauchy distribution but
this may be due to the fact that the Cauchy density is a very
smooth function.

The last two cases involve sampling from the uniform distribution
again but with different sample sizes and bandwidths. Figure 4.8 is
for $n = 1500$, $b(n) = 3/\sqrt{n}$ and $\hat{k} = 5.248$, while
Figure 4.9 is for $n = 100$ and $b(n) = 1/\sqrt{n}$ for which
$\hat{k} = 3.969$.

The problem in using $\beta(n)$ as a measure of goodness of fit in
the non-limiting Gamma case is to determine $k$ and $\theta$. If
one wishes to fit the Gamma distribution using the method of
moments, one can use the fact that the mean and variance of
$\beta(n)$ should be approximately (on asymptotic grounds)
$w^{(2)}(0)$ and $b(n)\,w^{(4)}(0)$, respectively. One might then
use

$$\hat{\theta} = \frac{b(n)\,w^{(4)}(0)}{w^{(2)}(0)}, \qquad \hat{k} = \frac{[w^{(2)}(0)]^2}{b(n)\,w^{(4)}(0)}$$

as estimates of $k$ and $\theta$. The results in Section 3 suggest
that this procedure should produce adequate results except when
there is appreciable bias in the density function estimate.
Figure 4.1. Distribution of the statistic $\beta(n)$ for a uniform
random variable with $n = 200$ and bandwidth $3/\sqrt{n}$. The
solid line shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 1.718$ and $\hat{\theta} = .00588$.

Figure 4.2. Distribution of the statistic $\beta(n)$ for a uniform
random variable with $n = 500$ and bandwidth $1/\sqrt{n}$. The
solid line shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 8.881$ and $\hat{\theta} = .00311$.

Figure 4.4. Distribution of the statistic $\beta(n)$ for a uniform
random variable with $n = 200$ and bandwidth $1/n$. The solid line
shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 39.511$ and $\hat{\theta} = .01675$.

Figure 4.5. Distribution of the statistic $\beta(n)$ for a Cauchy
random variable with $n = 100$ and bandwidth $1/\sqrt{n}$. The
solid line shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 22.362$ and $\hat{\theta} = .01745$.

Figure 4.6. Distribution of the statistic $\beta(n)$ for a Cauchy
random variable with $n = 100$ and bandwidth $3/\sqrt{n}$. The
solid line shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 9.272$ and $\hat{\theta} = .01331$.

Figure 4.7. Distribution of the statistic $\beta(n)$ for a Cauchy
random variable with $n = 1500$ and bandwidth $20/\sqrt{n}$. The
solid line shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 5.385$ and $\hat{\theta} = .00095$.

Figure 4.8. Distribution of the statistic $\beta(n)$ for a uniform
random variable with $n = 1500$ and bandwidth $3/\sqrt{n}$. The
solid line shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 5.248$ and $\hat{\theta} = .00096$.

Figure 4.9. Distribution of the statistic $\beta(n)$ for a uniform
random variable with $n = 100$ and bandwidth $1/\sqrt{n}$. The
solid line shows the Rosenblatt empirical density function of the
$\beta(n)$'s while the dashed line is a fitted Gamma density
function with $\hat{k} = 3.969$ and $\hat{\theta} = .01390$.


REFERENCES

[1] Bartlett, M.S., (1963). Statistical estimation of density
functions. Sankhya, Ser. A, v. 25, p. 245-254.

[2] Bickel, P.J. and Rosenblatt, M., (1973). On some global
measures of the deviations of density function estimates. The
Annals of Statistics, v. 1, p. 1071-1095.

[3] Liu, L.H., (1974). Empirical sampling investigation of a
global measure of fit of probability density functions. M.S.
Thesis, Naval Postgraduate School, Monterey.

[4] Rosenblatt, M., (1956). Remarks on some non-parametric
estimates of a density function. The Annals of Mathematical
Statistics, v. 27.

[5] Rosenblatt, M., (1971). Curve estimates. The Annals of
Mathematical Statistics, v. 42.

[6] Shenton, L.R., and Bowman, K.O., (1973). Comments on the Gamma
distribution and uses in rainfall data. Third Conference on
Probability and Statistics in Atmospheric Science. AMS.

[7] Wegman, E.J., (1972). Non-parametric probability density
estimation: I. A summary of available methods. Technometrics,
v. 14.